Memory abstractions for parallel programming
Abstract
A memory abstraction is an abstraction layer between the program execution and the memory that provides a different “view” of a memory location depending on the execution context in which the memory access is made. Properly designed memory abstractions help ease the task of parallel programming by mitigating the complexity of synchronization or admitting more efficient use of resources. This dissertation describes five memory abstractions for parallel programming: (i) cactus stacks that interoperate with linear stacks, (ii) efficient reducers, (iii) reducer arrays, (iv) ownership-aware transactions, and (v) location-based memory fences. To demonstrate the utility of memory abstractions, my collaborators and I developed Cilk-M, a dynamically multithreaded concurrency platform that embodies the first three memory abstractions.

Many dynamically multithreaded concurrency platforms incorporate cactus stacks to support multiple stack views for all the active children simultaneously. The use of cactus stacks, albeit essential, forces concurrency platforms to trade off among performance, memory consumption, and interoperability with serial code due to their incompatibility with linear stacks. This dissertation proposes a new strategy to build a cactus stack using thread-local memory mapping (or TLMM), which enables Cilk-M to satisfy all three criteria simultaneously.

A reducer hyperobject allows different branches of a dynamically multithreaded program to maintain coordinated local views of the same nonlocal variable. With reducers, one can use nonlocal variables in a parallel computation without restructuring the code or introducing races. This dissertation introduces memory-mapped reducers, which admit much more efficient access than existing implementations. When used in large quantities, reducers incur unnecessarily high overhead in execution time and space consumption. This dissertation describes support for reducer arrays, which offer the same functionality as an array of reducers with significantly less overhead.

Transactional memory is a high-level synchronization mechanism, designed to be easier to use and more composable than fine-grain locking. This dissertation presents ownership-aware transactions, the first transactional memory design that provides provable safety guarantees for “open-nested” transactions.

On architectures that implement memory models weaker than sequential consistency, programs communicating via shared memory must employ memory fences to ensure correct execution. This dissertation examines the concept of location-based memory fences, which, unlike traditional memory fences, incur latency only when synchronization is necessary.

Thesis Supervisor: Charles E. Leiserson
Title: Professor
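To make the reducer abstraction concrete, the following is a minimal sketch of race-free accumulation into a nonlocal variable. It is written against the Intel Cilk Plus C++ reducer library (`cilk::reducer_opadd`, compiled with a Cilk-enabled compiler such as GCC/ICC with `-fcilkplus`) purely for illustration: Cilk-M offers the same hyperobject semantics, but through its own memory-mapped implementation rather than this exact API.

```cpp
// Minimal reducer-hyperobject sketch in Intel Cilk Plus syntax.
// Cilk-M provides the same semantics via memory-mapped reducers;
// this exact API is an illustrative assumption, not Cilk-M's.
#include <cilk/cilk.h>
#include <cilk/reducer_opadd.h>
#include <cstdio>

int main() {
    const long n = 1000000;

    // Each parallel strand transparently sees its own local view of
    // `sum`; the runtime merges views with + as strands join, so the
    // result is deterministic and race-free without locks.
    cilk::reducer_opadd<long> sum(0);

    cilk_for (long i = 0; i < n; ++i) {
        sum += i;  // updates the executing strand's local view
    }

    // All local views have been reduced by the time the loop ends.
    std::printf("sum = %ld\n", sum.get_value());
    return 0;
}
```

Without the reducer, this loop would either race on `sum` or have to be restructured into per-worker partial sums; the hyperobject hides that bookkeeping beneath the memory abstraction, and the point of Cilk-M's memory-mapped reducers is to make each view lookup cheap.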
Similar resources
Concurrent Algorithms for Emerging Hardware Platforms
Abstract of “Concurrent Algorithms for Emerging Hardware Platforms” by Irina Calciu, Ph.D., Brown University, May 2015. Computer architecture has recently seen an explosion of innovation that has enabled more parallel execution, while parallel software systems have been making strides in providing more simplified programming models. The number of computing cores used in every area of the software ecosy...
Identifying a Unifying Mechanism for the Implementation of Concurrency Abstractions on Multi-language Virtual Machines
Supporting all known abstractions for concurrent and parallel programming in a virtual machine (VM) is a futile undertaking, but it is required to give programmers appropriate tools and performance. Instead of supporting all abstractions directly, VMs need a unifying mechanism similar to INVOKEDYNAMIC for JVMs. Our survey of parallel and concurrent programming concepts identifies concurrency a...
Reducing the complexity of debugging parallel REPLICA programs with pluggable abstraction patterns
Traditional debuggers focus on a single thread at a time or are better suited for concurrent programming with a low number of interacting threads and/or distributed memory, making it hard to monitor a massively data-parallel program on a shared-memory multi-core system. This work considers a globally step-synchronous model of computation. Compared to contemporary multi-core processors with inde...
Abstractions for Parallel N-body Simulations
Abstractions for Parallel N-body Simulations (Extended Abstract). Sandeep Bhatt, Marina Chen, Cheng-Yee Lin, Pangfeng Liu; Department of Computer Science, Yale University, New Haven, CT 06520. Abstract: This paper introduces C++ programming abstractions for maintaining load-balanced partitions of irregular and adaptive trees. Such abstractions are useful across a range of applications and MIMD architectures. Th...
Explicit Management of Memory Hierarchy
All scalable parallel computers feature a memory hierarchy, in which some locations are “closer” to a particular processor than others. The hardware in a particular system may support a shared memory or message passing programming model, but these factors affect only the relative costs of local and remote accesses, not the system’s fundamental Non-Uniform Memory Access (NUMA) characteristics. Y...